AITopics | prediction strategy

Collaborating Authors

prediction strategy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Near-optimal Swap Regret Minimization for Convex Losses

Hu, Lunjia, Schneider, Jon, Wu, Yifan

arXiv.org Machine LearningFeb-10-2026

We give a randomized online algorithm that guarantees near-optimal $\widetilde O(\sqrt T)$ expected swap regret against any sequence of $T$ adaptively chosen Lipschitz convex losses on the unit interval. This improves the previous best bound of $\widetilde O(T^{2/3})$ and answers an open question of Fishelson et al. [2025b]. In addition, our algorithm is efficient: it runs in $\mathsf{poly}(T)$ time. A key technical idea we develop to obtain this result is to discretize the unit interval into bins at multiple scales of granularity and simultaneously use all scales to make randomized predictions, which we call multi-scale binning and may be of independent interest. A direct corollary of our result is an efficient online algorithm for minimizing the calibration error for general elicitable properties. This result does not require the Lipschitzness assumption of the identification function needed in prior work, making it applicable to median calibration, for which we achieve the first $\widetilde O(\sqrt T)$ calibration error guarantee.

artificial intelligence, machine learning, swap regret, (18 more...)

arXiv.org Machine Learning

2602.08862

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(2 more...)

Genre: Research Report (0.70)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Game Theory (0.68)

Add feedback

Training LLMs Beyond Next Token Prediction -- Filling the Mutual Information Gap

Yang, Chun-Hao, Feng, Bo-Han, Lai, Tzu-Yuan, Chen, Yan Yu, Huang, Yin-Kai Dean, Lin, Shou-De

arXiv.org Artificial IntelligenceNov-4-2025

Optimizing training performance in large language models (LLMs) remains an essential challenge, particularly in improving model performance while maintaining computational costs. This work challenges the conventional approach of training LLMs using next-token prediction (NTP), arguing that by predicting information-rich tokens during training, there is a more effective way to train LLMs. We investigate the impact of the proposed solution in three kinds of tasks for LLMs: arithmetic, multi-label classification of text, and natural-language generation. This work offers a principled approach to optimizing LLM training, advancing both model performance and theoretical understanding of the target-token selection strategies.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2511.00198

Country: Asia > Taiwan (0.40)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Monitoring State Transitions in Markovian Systems with Sampling Cost

Saurav, Kumar, Shroff, Ness B., Liang, Yingbin

arXiv.org Machine LearningOct-28-2025

We consider a node-monitor pair, where the node's state varies with time. The monitor needs to track the node's state at all times; however, there is a fixed cost for each state query. So the monitor may instead predict the state using time-series forecasting methods, including time-series foundation models (TSFMs), and query only when prediction uncertainty is high. Since query decisions influence prediction accuracy, determining when to query is nontrivial. A natural approach is a greedy policy that predicts when the expected prediction loss is below the query cost and queries otherwise. We analyze this policy in a Markovian setting, where the optimal (OPT) strategy is a state-dependent threshold policy minimizing the time-averaged sum of query cost and prediction losses. We show that, in general, the greedy policy is suboptimal and can have an unbounded competitive ratio, but under common conditions such as identically distributed transition probabilities, it performs close to OPT. For the case of unknown transition probabilities, we further propose a projected stochastic gradient descent (PSGD)-based learning variant of the greedy policy, which achieves a favorable predict-query tradeoff with improved computational efficiency compared to OPT.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

2510.22327

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
Asia > India (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing

Ma, Haiyue, Du, Zhixu, Chen, Yiran

arXiv.org Artificial IntelligenceJun-10-2025

In multi-GPU Mixture-of-Experts (MoE) network, experts are distributed across different GPUs, which creates load imbalance as each expert processes different number of tokens. Recent works improve MoE inference load balance by dynamically duplicating popular experts to more GPUs to process excessive tokens, which requires predicting the distribution before routing. In this paper, we discuss the tradeoff of prediction strategies, accuracies, overhead, and end-to-end system performance. We propose MoE-GPS, a framework that guides the selection of the optimal predictor design under various system configurations, by quantifying the performance impact to system-level model runtime. Specifically, we advocate for Distribution-Only Prediction, a prediction strategy that only predicts overall token distribution which significantly reduces overhead compared to the traditional Token-to-Expert Prediction. On Mixtral 8x7B MMLU dataset, MoE-GPS suggests Distribution-Only Prediction which improves end-to-end inference performance by more than 23% compared with Token-to-Expert Prediction.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.07366

Genre: Research Report > New Finding (0.68)

Industry: Energy > Power Industry (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Add feedback

Autoregressive Generation Strategies for Top-K Sequential Recommendations

Volodkevich, Anna, Gusak, Danil, Klenitskiy, Anton, Vasilev, Alexey

arXiv.org Artificial IntelligenceSep-26-2024

The goal of modern sequential recommender systems is often formulated in terms of next-item prediction. In this paper, we explore the applicability of generative transformer-based models for the Top-K sequential recommendation task, where the goal is to predict items a user is likely to interact with in the "near future". We explore commonly used autoregressive generation strategies, including greedy decoding, beam search, and temperature sampling, to evaluate their performance for the Top-K sequential recommendation task. In addition, we propose novel Reciprocal Rank Aggregation (RRA) and Relevance Aggregation (RA) generation strategies based on multi-sequence generation with temperature sampling and subsequent aggregation. Experiments on diverse datasets give valuable insights regarding commonly used strategies' applicability and show that suggested approaches improve performance on longer time horizons compared to widely-used Top-K prediction approach and single-sequence autoregressive generation strategies.

dataset, recommendation, sequence, (12 more...)

arXiv.org Artificial Intelligence

2409.1773

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.05)
Asia > Russia (0.05)
North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

A Unified Approach to Inferring Chemical Compounds with the Desired Aqueous Solubility

Batool, Muniba, Azam, Naveed Ahmed, Zhu, Jianshen, Haraguchi, Kazuya, Zhao, Liang, Akutsu, Tatsuya

arXiv.org Artificial IntelligenceSep-6-2024

Aqueous solubility (AS) is a key physiochemical property that plays a crucial role in drug discovery and material design. We report a novel unified approach to predict and infer chemical compounds with the desired AS based on simple deterministic graph-theoretic descriptors, multiple linear regression (MLR) and mixed integer linear programming (MILP). Selected descriptors based on a forward stepwise procedure enabled the simplest regression model, MLR, to achieve significantly good prediction accuracy compared to the existing approaches, achieving the accuracy in the range [0.7191, 0.9377] for 29 diverse datasets. By simulating these descriptors and learning models as MILPs, we inferred mathematically exact and optimal compounds with the desired AS, prescribed structures, and up to 50 non-hydrogen atoms in a reasonable time range [6, 1204] seconds. These findings indicate a strong correlation between the simple graph-theoretic descriptors and the AS of compounds, potentially leading to a deeper understanding of their AS without relying on widely used complicated chemical descriptors and complex machine learning models that are computationally expensive, and therefore difficult to use for inference. An implementation of the proposed approach is available at https://github.com/ku-dml/mol-infer/tree/master/AqSol.

compound, dataset, descriptor, (15 more...)

arXiv.org Artificial Intelligence

2409.04301

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)
Asia > Pakistan (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Materials > Chemicals (0.87)
Health & Medicine > Pharmaceuticals & Biotechnology (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Backdoor defense, learnability and obfuscation

Christiano, Paul, Hilton, Jacob, Lecomte, Victor, Xu, Mark

arXiv.org Artificial IntelligenceSep-4-2024

We introduce a formal notion of defendability against backdoors using a game between an attacker and a defender. In this game, the attacker modifies a function to behave differently on a particular input known as the "trigger", while behaving the same almost everywhere else. The defender then attempts to detect the trigger at evaluation time. If the defender succeeds with high enough probability, then the function class is said to be defendable. The key constraint on the attacker that makes defense possible is that the attacker's strategy must work for a randomly-chosen trigger. Our definition is simple and does not explicitly mention learning, yet we demonstrate that it is closely connected to learnability. In the computationally unbounded setting, we use a voting algorithm of Hanneke et al. (2022) to show that defendability is essentially determined by the VC dimension of the function class, in much the same way as PAC learnability. In the computationally bounded setting, we use a similar argument to show that efficient PAC learnability implies efficient defendability, but not conversely. On the other hand, we use indistinguishability obfuscation to show that the class of polynomial size circuits is not efficiently defendable. Finally, we present polynomial size decision trees as a natural example for which defense is strictly easier than learning. Thus, we identify efficient defendability as a notable intermediate concept in between efficient learnability and obfuscation.

detection strategy, probability, representation class, (14 more...)

arXiv.org Artificial Intelligence

2409.03077

Country: Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Add feedback

Prediction strategies without loss Rina Panigrahy Stanford University Microsoft Research Silicon Valley Stanford, CA

Neural Information Processing SystemsMar-14-2024, 22:42:30 GMT

Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say'predict 0' or'predict 1', and our payoff is +1 if the prediction is correct and 1 otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right predictions so far. In this paper we are interested in algorithms that have essentially zero (expected) loss over any string at any point in time and yet have small regret with respect to always predicting 0 or always predicting 1.

algorithm, payoff, sequence, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Stanford (0.40)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)
Europe > Czechia > Prague (0.04)

Industry: Information Technology (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

Add feedback

Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations

Dreyer, Maximilian, Achtibat, Reduan, Samek, Wojciech, Lapuschkin, Sebastian

arXiv.org Artificial IntelligenceNov-28-2023

Ensuring both transparency and safety is critical when deploying Deep Neural Networks (DNNs) in high-risk applications, such as medicine. The field of explainable AI (XAI) has proposed various methods to comprehend the decision-making processes of opaque DNNs. However, only few XAI methods are suitable of ensuring safety in practice as they heavily rely on repeated labor-intensive and possibly biased human assessment. In this work, we present a novel post-hoc concept-based XAI framework that conveys besides instance-wise (local) also class-wise (global) decision-making strategies via prototypes. What sets our approach apart is the combination of local and global strategies, enabling a clearer understanding of the (dis-)similarities in model decisions compared to the expected (prototypical) concept use, ultimately reducing the dependence on human long-term assessment. Quantifying the deviation from prototypical behavior not only allows to associate predictions with specific model sub-strategies but also to detect outlier behavior. As such, our approach constitutes an intuitive and explainable tool for model validation. We demonstrate the effectiveness of our approach in identifying out-of-distribution samples, spurious model behavior and data quality issues across three datasets (ImageNet, CUB-200, and CIFAR-10) utilizing VGG, ResNet, and EfficientNet architectures. Code is available on https://github.com/maxdreyer/pcx.

prediction, prototype, relevance, (16 more...)

arXiv.org Artificial Intelligence

2311.16681

Country:

Europe > Germany > Berlin (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > France (0.04)

Genre: Research Report (1.00)

Industry:

Transportation (1.00)
Government (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

Extrapolation-based Prediction-Correction Methods for Time-varying Convex Optimization

Bastianello, Nicola, Carli, Ruggero, Simonetto, Andrea

arXiv.org Artificial IntelligenceMay-4-2023

In this paper, we focus on the solution of online optimization problems that arise often in signal processing and machine learning, in which we have access to streaming sources of data. We discuss algorithms for online optimization based on the prediction-correction paradigm, both in the primal and dual space. In particular, we leverage the typical regularized least-squares structure appearing in many signal processing problems to propose a novel and tailored prediction strategy, which we call extrapolation-based. By using tools from operator theory, we then analyze the convergence of the proposed methods as applied both to primal and dual problems, deriving an explicit bound for the tracking error, that is, the distance from the time-varying optimal solution. We further discuss the empirical performance of the algorithm when applied to signal processing, machine learning, and robotics problems.

artificial intelligence, machine learning, prediction, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.sigpro.2023.109089

2004.11709

Country:

Europe > Netherlands > South Holland > Dordrecht (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Sweden (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback